Generalization-Based k-Anonymization
Abstract
Microaggregation is an anonymization technique that consists of partitioning the data into clusters of no fewer than k elements and then replacing each cluster by its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method, based on generalization, for obtaining a k-anonymous masked file for categorical microdata. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects by the description. The way this generalization is constructed is similar to the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. The experiments we performed show that the new approach gives good results.

Source URL: https://www.iiia.csic.es/en/node/54935

Links:
[1] https://www.iiia.csic.es/en/staff/eva-armengol
[2] https://www.iiia.csic.es/en/staff/vicen%C3%A7-torra
[3] https://www.iiia.csic.es/en/bibliography?f[author]=1638
[4] https://www.iiia.csic.es/en/bibliography?f[author]=1639
[5] https://www.iiia.csic.es/en/bibliography?f[keyword]=919
[6] https://www.iiia.csic.es/en/bibliography?f[keyword]=498
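The abstract describes the method only at a high level. What follows is a minimal Python sketch of one way a decision-tree-like, top-down generalization could produce a k-anonymous masked file from categorical records. It is an illustration, not the authors' algorithm.

All names and choices here are hypothetical assumptions:

- Records are dictionaries of categorical values.
- A description maps each attribute to a value or to the wildcard `*`.
- `generalize` and `masked_file` are illustrative function names.
- The split attribute is chosen greedily to minimize discarded records.

Each published group shares one description and contains at least k records. Records left in value groups too small to publish are discarded, mirroring the information loss the abstract mentions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]
Description = Dict[str, str]  # attribute -> value, or WILDCARD for "any value"
WILDCARD = "*"


def _split(records: List[Record], attr: str) -> Dict[str, List[Record]]:
    """Group records by their value for one categorical attribute."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r[attr]].append(r)
    return groups


def generalize(
    records: List[Record], attrs: List[str], k: int
) -> Tuple[List[Tuple[Description, List[Record]]], List[Record]]:
    """Hypothetical top-down generalization (not the paper's exact method).

    Returns (published, discarded): published is a list of
    (description, records) pairs with at least k records each;
    discarded holds records that could not be kept k-anonymously.
    """
    published: List[Tuple[Description, List[Record]]] = []
    discarded: List[Record] = []

    def grow(node: List[Record], description: Description, remaining: List[str]) -> None:
        best = None
        for attr in remaining:
            groups = _split(node, attr)
            large = {v: g for v, g in groups.items() if len(g) >= k}
            leftover = [r for g in groups.values() if len(g) < k for r in g]
            if not large:
                continue  # this attribute cannot specialize any group of size >= k
            # Leftover records can stay under the current description only if
            # there are at least k of them; otherwise they would be lost.
            lost = len(leftover) if len(leftover) < k else 0
            covered = sum(len(g) for g in large.values())
            score = (lost, -covered)  # fewest lost records, then most specialized
            if best is None or score < best[0]:
                best = (score, attr, large, leftover)
        if best is None:
            # No useful split: publish the whole node under its description.
            published.append((description, node))
            return
        _, attr, large, leftover = best
        rest = [a for a in remaining if a != attr]
        if len(leftover) >= k:
            published.append((description, leftover))  # attr stays generalized
        else:
            discarded.extend(leftover)
        for value, group in large.items():
            grow(group, {**description, attr: value}, rest)

    if len(records) < k:
        return [], list(records)  # nothing can be published k-anonymously
    grow(list(records), {a: WILDCARD for a in attrs}, list(attrs))
    return published, discarded


def masked_file(published: List[Tuple[Description, List[Record]]]) -> List[Description]:
    """Replace every published record by its group's generalized description."""
    return [dict(d) for d, group in published for _ in group]
```

Consider k = 3 with three records (F, nurse), three records (M, nurse), and one record (M, clerk). The sketch splits first on `sex`, then on `job`. It publishes the groups {sex: F, job: nurse} and {sex: M, job: nurse} and discards the single clerk.

The greedy rule always specializes when it can. Here that costs one record that a coarser {sex: M, job: *} group would have kept. A practical implementation would presumably weigh such losses against the specificity gained.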
Similar resources
Scalable Multidimensional Anonymization Algorithm over Big Data Using Map Reduce on Public Cloud
It appears that everybody is paying special attention to the emergence of big data and its use. There is no doubt that the big data revolution has begun. Although big data practices offer favorable business payoffs, they carry substantial privacy implications. Multidimensional generalization anonymization is a practical method for data privacy preservation. Top-Down S...
Utility-preserving anonymization for health data publishing
BACKGROUND: Publishing raw electronic health records (EHRs) may be considered a breach of individuals' privacy because they usually contain sensitive information. A common practice for privacy-preserving data publishing is to anonymize the data before publishing, and thus satisfy privacy models such as k-anonymity. Among various anonymization techniques, generalization is the most c...
Generalizations with Probability Distributions for Data Anonymization
Anonymization-based privacy protection ensures that data cannot be traced to an individual. Many anonymity algorithms proposed so far make use of different value generalization techniques to satisfy different privacy constraints. This paper presents a pdf-generalization method that empowers data value generalizations with probability distribution functions, enabling the publisher to have better co...
Information based data anonymization for classification utility
Article history: Received 27 September 2010; received in revised form 10 April 2011; accepted 5 July 2011; available online 22 July 2011.
Anonymization is a practical approach to protect privacy in data. The major objective of privacy preserving data publishing is to protect private information in data whereas data is still useful for some intended applications, such as building classification mode...
Anonymity: Formalisation of Privacy – k-anonymity
Microdata is the basis of statistical studies. If microdata is released, it can leak sensitive information about the participants, even if identifiers like name or social security number are removed. A proper anonymization for statistical microdata is essential. K-anonymity has been intensively discussed as a measure for anonymity in statistical data. Quasi identifiers are attributes that might...
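The snippet above defines k-anonymity in terms of quasi-identifiers. A released table is k-anonymous with respect to a chosen set of quasi-identifiers when every combination of their values that occurs in the table is shared by at least k records.

This minimal Python sketch checks that property. The record layout, attribute names, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def equivalence_classes(records: List[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_class_size(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest equivalence class (0 for an empty table)."""
    return min(equivalence_classes(records, quasi_identifiers).values(), default=0)


def is_k_anonymous(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    return all(c >= k for c in equivalence_classes(records, quasi_identifiers).values())
```

Take the quasi-identifiers `zip` and `age` and a table with rows (`081**`, `20-29`), (`081**`, `20-29`), and (`082**`, `30-39`). That table is only 1-anonymous, because the last combination occurs once. A sensitive attribute, such as a hypothetical `disease` column, is deliberately left out of the check.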